Official code release of our work, Summarize and Generate to Back-translate: Unsupervised Translation of Programming Languages.
Setup • Train • Evaluation • License • Citation
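As a toy illustration of the idea (not the repository's actual pipeline), summarize-and-generate back-translation turns monolingual code into pseudo-parallel training pairs by summarizing source code into natural language and generating target-language code from the summary. The lookup tables and function names below are hypothetical stand-ins for the trained summarization and generation models:

```python
# Toy sketch of summarize-and-generate back-translation.
# The dictionaries below are hypothetical stand-ins for the trained
# summarization (code -> text) and generation (text -> code) models.

JAVA_TO_SUMMARY = {
    "int add(int a, int b) { return a + b; }": "add two integers",
}
SUMMARY_TO_PYTHON = {
    "add two integers": "def add(a, b): return a + b",
}

def summarize(java_code):
    # code -> natural-language summary (stand-in for the sum model)
    return JAVA_TO_SUMMARY[java_code]

def generate(summary):
    # summary -> code in the target language (stand-in for the gen model)
    return SUMMARY_TO_PYTHON[summary]

def back_translate(java_corpus):
    # build pseudo-parallel (python, java) pairs from monolingual Java code
    return [(generate(summarize(code)), code) for code in java_corpus]

pairs = back_translate(list(JAVA_TO_SUMMARY))
print(pairs[0][0])  # def add(a, b): return a + b
```

The resulting (python, java) pairs can then supervise a translation model even though no parallel data existed to begin with, which is the premise of the paper.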
## Setup

We recommend setting up a conda environment to run experiments and assume Anaconda is installed. Install the additional requirements (listed in requirements.txt) by running:
```bash
bash install_env.sh
```
Then build the tree-sitter parsers for Java and Python by running:

```bash
python build.py
```
Finally, download the pre-trained PLBART checkpoints:

```bash
cd plbart
bash download.sh
```

## Train

Two model sizes are available, so experiments can be run with MODEL_SIZE=base|large.
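As a minimal sketch of how a MODEL_SIZE argument might be validated and resolved to a checkpoint (the helper name and file paths below are illustrative assumptions, not the repository's actual layout):

```python
# Hypothetical helper: validate MODEL_SIZE and resolve a checkpoint path.
# The paths below are illustrative, not the repository's actual layout.
CHECKPOINTS = {
    "base": "plbart/plbart_base.pt",
    "large": "plbart/plbart_large.pt",
}

def checkpoint_for(model_size):
    # reject anything other than the two supported sizes
    if model_size not in CHECKPOINTS:
        raise ValueError(
            "MODEL_SIZE must be one of: " + ", ".join(sorted(CHECKPOINTS))
        )
    return CHECKPOINTS[model_size]

print(checkpoint_for("base"))  # plbart/plbart_base.pt
```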
```bash
cd sumgen
bash run.sh GPU_ID [MODEL_SIZE]
```

```bash
cd plbart
bash train.sh GPU_ID [MODEL_SIZE]
```

## Evaluation

```bash
cd sumgen/evaluation
bash decode.sh GPU_ID SOURCE TARGET MODEL_SIZE BEAM_SIZE
bash evaluate.sh SAVE_DIR SOURCE TARGET
```

For example, run the following commands to get results with the default settings:
```bash
cd sumgen/evaluation

# to evaluate the base model
bash decode.sh 0 java python base 10
bash evaluate.sh base_java_python_b10 java python

# to evaluate the large model
bash decode.sh 0 java python large 10
bash evaluate.sh large_java_python_b10 java python
```

```bash
cd scripts
bash run.sh GPU_ID
```

## License

The contents of this repository are under the MIT license. The license also applies to the pre-trained and fine-tuned models.
## Citation

If you use any of the datasets, models, or code modules, please cite the following paper:
```bibtex
@article{ahmad2022sumgen,
  author     = {Wasi Uddin Ahmad and Saikat Chakraborty and Baishakhi Ray and Kai-Wei Chang},
  title      = {Summarize and Generate to Back-translate: Unsupervised Translation of Programming Languages},
  journal    = {CoRR},
  volume     = {abs/2205.11116},
  year       = {2022},
  url        = {https://arxiv.org/abs/2205.11116},
  eprinttype = {arXiv},
  eprint     = {2205.11116}
}
```